A  Stochastic  Optimization  Algorithm 
Using  Intelligent  Agents 

A  Program  with  Constraints  and  Rate  of 
Convergence 

Bao  U.  Nguyen 
Complexity  Team 


DRDC  CORA  TM  2010-249 
November  2010 


Defence  R&D  Canada 

Centre  for  Operational  Research  &  Analysis 




National  Defense 
Defence  nationale 


Canada 


DRDC  CORA 


Defence  R&D  Canada  -  CORA 

Technical  Memorandum 
DRDC  CORA  TM  2010-249 
November  2010 


Principal  Author 

Original  signed  by  Bao  U.  Nguyen,  Ph.D. 
Bao  U.  Nguyen,  Ph.D. 
Defence  Scientist 

Approved  by 

Original  signed  by  D.M.  Bergeron,  Ph.D. 

D.M.  Bergeron,  Ph.D. 

Head  Air  Section 

Approved  for  release  by 

Original  signed  by  P.  Comeau 
P.  Comeau 
Chief  Scientist 


Defence  R&D  Canada  -  Centre  for  Operational  Research  and  Analysis  (CORA) 


©  Her  Majesty  the  Queen  in  Right  of  Canada,  as  represented  by  the  Minister  of  National  Defence,  2010 


© Sa Majesté la Reine (en droit du Canada), telle que représentée par le ministre de la Défense nationale, 2010


Abstract 


This paper examines the problem of optimizing the average time latency of a network using
agents that are able to learn. The network design is constrained by a traffic matrix that
dedicates specific flows between specific pairs of nodes. Although the analysis is motivated by
an application, only the methodology is presented here: an optimization algorithm and a
corresponding conservative rate of convergence that assumes no learning. The application
will be presented once data are available. The tools developed in this paper are expected to
apply to a wide range of objective functions other than the time latency, for example the cost
of the network.


Executive summary


A  Stochastic  Optimization  Algorithm  using  Intelligent  Agents: 
With  Constraints  and  Rate  of  Convergence 

Bao  U.  Nguyen;  DRDC  CORA  TM  2010-249;  Defence  R&D  Canada  -  CORA; 
November  2010. 

Background: The power of computers nowadays allows us to examine problems where
closed-form solutions do not necessarily exist, but which can be solved using efficient
algorithms. In addition, models based on algorithms can simulate a level of detail that
closed-form solutions often cannot. In this Technical Memorandum (TM), one such algorithm
is used to model a network such as the one implemented in the Networked Underwater Warfare
Technology Demonstration Program (Ref [1]) that was conducted at DRDC Atlantic.

Results:  The  current  algorithm  consists  of  intelligent  agents  who  can  learn.  The  learning 
process  of  the  agents  leads  to  optimization  of  an  objective  function  that  is  subject  to  a  number 
of  constraints.  The  objective  function  was  chosen  to  be  the  average  time  latency  of  a  network, 
and  the  constraints  to  be  the  traffic  matrix.  The  aim  is  to  minimize  the  average  time  latency 
while  maintaining  dedicated  flows  among  pairs  of  nodes  that  form  a  network.  A  node  can  be 
a  sensor,  a  ship,  an  aircraft,  a  submarine,  etc.  However,  this  algorithm  is  general  in  the  sense 
that  it  can  optimize  other  objective  functions  that  are  not  the  time  latency  and  can  model  other 
types  of  constraints  that  are  not  the  traffic  matrix. 

Significance: This report describes a novel optimization algorithm as well as the rate of
convergence for the algorithm. This is a new theoretical development for heuristic algorithms
that simulate Markov processes. It gives the Operations Research practitioner a valuable tool
to determine how good an algorithm is and how close to optimal the solution is. To the best of
the author's knowledge, such a convergence criterion is not available in the open literature. It
is hoped that the reader will make use of this type of agent-based algorithm and the
corresponding convergence rates in the application of heuristic optimization algorithms.
1 Background


In support of a Technology Investment Fund (TIF) project at DRDC CORA on Network
Centric Warfare (Ref [2]), this paper determines the time latency of a generic network that is
subject to a number of constraints. The time latency of a network can be determined in many
ways; this paper describes a methodology for optimizing the time latency using agents that
have the ability to learn. The main results of this paper are an algorithm to do so and the rate
of convergence of that algorithm when no learning is in effect. The methodology is inspired
by Ref [3], which provides a heuristic algorithm to optimize a cost objective function.
Although part of the material in this Technical Memorandum (TM) has been published in
Ref [4], the report includes an improvement in the convergence rate of the optimization
algorithm and a more complete proof of this convergence rate.


Figure  1:  An  Example  of  a  Defence  Network. 


2  Learning  Algorithm 


This section examines a communication network that has a globally maximal capacity $C$. The
network flows must satisfy the traffic matrix $(\gamma_{uv})$. That is, there will be a dedicated
flow from node $u$ to node $v$ that is greater than or equal to $\gamma_{uv}$. This ensures that
node $u$ can communicate with node $v$ with the desired flow $\gamma_{uv}$. In addition, the
time latency of the network depends on both the flow and the capacity of each link. This
development makes use of the algorithm in Ref [3], which is described below.

The capacities of the links are represented by a vector $(c_1, c_2, \ldots, c_e)$ where $e$ is the
number of edges or links in the network. Each $c_i$ is chosen from a finite set of capacities
such as $(0, 1, 2, \ldots)$ where a capacity unit corresponds, for example, to 1200 bps (bits per
second). For each link $i$ and each possible capacity $j$, there is a triplet
$(I_{ij}, S_{ij}, D_{ij})$ where $I_{ij}$ is the probability that the current capacity $j$ of the
link $i$ should be increased, $S_{ij}$ is the probability that the current capacity $j$ of the
link $i$ should remain unchanged, and $D_{ij}$ is the probability that the current capacity $j$
of the link $i$ should be decreased. The capacity $c_i$ of a link $i$ is modelled as an agent
whose learning process is encoded in the evolution of the triplet $(I_{ij}, S_{ij}, D_{ij})$ where

$$I_{ij} + S_{ij} + D_{ij} = 1.$$

The final solution vector will consist of the capacities $c_i$ such that the $S_{ij}$ probability
values approach unity, e.g. 0.99. The closer this value is to unity, the more accurate the
solution. This is because, as the optimal solution is approached, the algorithm favours the
current solution, hence $S_{ij}$ tends to unity. As is often the case with heuristic algorithms,
Ref [3] did not provide the rate of convergence of its algorithm. Fortunately, it is possible to
derive the rate of convergence for this algorithm, at least in the case where agents do not
learn. However, before presenting the derivation for the rate of convergence, the pseudo-code
in Ref [3] is shown below. The algorithm can be divided into three modules. Module 1
initializes the triplet $(I_{ij}, S_{ij}, D_{ij})$, looks for a feasible solution, and determines its
value as dictated by the objective function. Module 2 searches the solution space. Module 3
updates the triplet $(I_{ij}, S_{ij}, D_{ij})$.

Module 1. Initialize the triplet (I_ij, S_ij, D_ij).

For (i = 1 to maxlinks (= e))
    For (j = 1 to maxcaps (= C))
        If (j = 1 (left-boundary-state))
            I_ij = 1/2, S_ij = 1/2, D_ij = 0
        End-If
        If (j = maxcaps (right-boundary-state))
            I_ij = 0, S_ij = 1/2, D_ij = 1/2
        End-If
        If (1 < j < C (internal-state))
            a = 1/3, I_ij = a, S_ij = a, D_ij = a
        End-If
    End-For
End-For
Repeat
    For (i = 1 to maxlinks)
        c_i = RAND(0, maxcaps)
    End-For
Until (network is feasible)
current-objective = calculate-objective()
For (i = 1 to maxlinks)
    best-c_i = c_i
End-For
best-objective = current-objective()

Module 2. Search the solution space.

While (count < num-iterations) and (accuracy-level(all links) < required accuracy)
    For (i = 1 to maxlinks)
        Action_i = RAND(Increase_ij, Stay_ij, Decrease_ij)
        If (Action_i = Increase_ij)
            c_i = c_i + 1
        End-If
        If (Action_i = Decrease_ij)
            c_i = c_i - 1
        End-If
        current-objective = calculate-objective()
    End-For
    For (i = 1 to maxlinks)
        j =
        If (network is feasible)
            If (Action_i = Increase_ij)
                Raise(I_ij, lambda_R1)
            End-If
            If (Action_i = Stay_ij)
                Raise(S_ij, lambda_R1)
            End-If
            If (Action_i = Decrease_ij)
                Raise(D_ij, lambda_R1)
            End-If
        Else
            Reset all links to best-objective capacities
        End-If
        If (network is feasible) and (current-objective < best-objective)
            If (Action_i = Increase_ij)
                Raise(I_ij, lambda_R2)
            End-If
            If (Action_i = Stay_ij)
                Raise(S_ij, lambda_R2)
            End-If
            If (Action_i = Decrease_ij)
                Raise(D_ij, lambda_R2)
            End-If
            For (i = 1 to maxlinks)
                best-c_i = c_i
            End-For
            best-objective = current-objective()
        End-If
    End-For
End-While

Module 3. Procedure Raise. Updating the triplet (I_ij, S_ij, D_ij). lambda_R = lambda_R1 is
associated with a new feasible solution. lambda_R = lambda_R2 is associated with a new
feasible solution that is also superior.

If (Action = Increase)
    D_ij = lambda_R * D_ij; S_ij = lambda_R * S_ij; I_ij = 1 - (D_ij + S_ij)
End-If
If (Action = Stay)
    I_ij = lambda_R * I_ij; D_ij = lambda_R * D_ij; S_ij = 1 - (I_ij + D_ij)
End-If
If (Action = Decrease)
    I_ij = lambda_R * I_ij; S_ij = lambda_R * S_ij; D_ij = 1 - (I_ij + S_ij)
End-If
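
The Raise procedure of Module 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `raise_action`, the dictionary layout of the triplet, and the reward value `lam=0.9` are hypothetical and do not come from Ref [3]; the update itself (scale the two other probabilities by the reward parameter, give the remainder to the chosen action) follows the reading of Module 3 above.

```python
def raise_action(triplet, action, lam):
    """Reinforce `action` in a triplet {"I": ..., "S": ..., "D": ...}.

    The two other probabilities are multiplied by lam (0 < lam < 1) and the
    chosen action receives the remainder, so the triplet still sums to one.
    """
    if action not in ("I", "S", "D"):
        raise ValueError("action must be 'I', 'S' or 'D'")
    updated = {k: lam * p for k, p in triplet.items() if k != action}
    updated[action] = 1.0 - sum(updated.values())
    return updated


# Internal states start with a = 1/3 for each action (Module 1).
start = {"I": 1 / 3, "S": 1 / 3, "D": 1 / 3}
after = raise_action(start, "S", lam=0.9)  # lam = 0.9 is an assumed value
print(after)
```

Repeated calls with the same action drive that action's probability towards unity, which is the behaviour the text describes for the Stay probability near the optimum.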


3  Algorithm  Extension 


The capacity assignment is modelled in the same way as that of Oommen and Roberts 2000
(see previous section). Hence, there is a triplet $(I^c_{ij}, S^c_{ij}, D^c_{ij})$ associated with
link $i$ and capacity $j$, where the superscript $c$ stands for capacity. In addition, each path
$l$ is modelled in a similar way as an agent that carries $k$ units of flow and which connects
node $u$ to node $v$. Each path-type agent is represented by a triplet
$(I^p_{uvlk}, S^p_{uvlk}, D^p_{uvlk})$ where the superscript $p$ stands for path. These triplets
are updated at each run depending on random numbers and whether the objective function is
improved or not. For example, if the objective function is improved then both $S^c_{ij}$ and
$S^p_{uvlk}$ are increased. This new algorithm is the first main result of this paper and can be
used to optimize the average time latency of a network, as defined in Ref [5]:


$$T = \frac{1}{\sum_{(u,v)} \gamma_{uv}} \sum_{k} \frac{f_k}{c_k - f_k} \tag{1}$$


where $(u,v)$ represents node $u$ and node $v$; each $k$ represents a link while $f_k$ and $c_k$
are respectively the flow and capacity of that link. Enumeration is used to generate all
possible paths that connect node $u$ to node $v$. Modelling each path as an agent ensures flow
conservation through each node. The flow through a link is then equal to the sum of the flows
of all paths that traverse that link. For example, let $\{a, b, c, d, e\}$ be the set of nodes of a
complete graph (all possible links) as shown in Figure 2. Consider the link $(a-b)$; let path1
be $(a-b-c)$ with 2 units of flow, path2 be $(a-b-d)$ with 1 unit of flow and path3 be
$(c-a-b)$ with 3 units of flow. The flow through the link $(a-b)$ will be the sum
$2 + 1 + 3 = 6$, as each of the three paths traverses $(a-b)$. Note that the flow of a link ranges
from zero to $C$ (the maximal capacity of the network).
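
The path-to-link bookkeeping in this example can be sketched as follows. The helper name `link_flows` and the representation of a path as a tuple of node labels are illustrative assumptions, and links are treated as undirected, which the text does not state explicitly.

```python
from collections import defaultdict


def link_flows(paths):
    """Sum the path flows on every (undirected) link the paths traverse."""
    flows = defaultdict(int)
    for nodes, units in paths:
        # Consecutive nodes of a path form the links it traverses.
        for u, v in zip(nodes, nodes[1:]):
            flows[frozenset((u, v))] += units
    return flows


paths = [
    (("a", "b", "c"), 2),  # path1
    (("a", "b", "d"), 1),  # path2
    (("c", "a", "b"), 3),  # path3
]
flows = link_flows(paths)
print(flows[frozenset(("a", "b"))])  # 6
```

Because every path carries its flow along all of its links, flow is conserved at each intermediate node by construction.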
4 Rate of Convergence of Flows in a Graph


The rate of convergence is defined in a similar way to that in Ref [6]: it is the probability that
a globally optimal state is found at least once. The rate of convergence derived below
assumes no learning. Since the purpose of learning is to accelerate the convergence of the
algorithm, this rate is expected to be a conservative estimate for the algorithm. Observe that
the flow through a link is regulated by a Markov chain as shown in Fig. 3. Each flow state is
labelled by a number. For example, the label zero indicates that the flow is equal to zero
units, the label one indicates that the flow is equal to one unit, and so on. The connections
between the labels show the transitions among the states. For example, the connection from
flow zero to flow one represents the probability that flow zero makes a transition to flow one.
The connection from flow one to flow one represents the probability that flow one remains
flow one.


Figure  3:  A  Markov  Chain. 


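The Markov chain shown in Fig. 3 can be represented by a transition matrix $P$ where $P_{ij}$
is the probability that state $i$ transitions into state $j$. For example, given $C = 4$, with rows
and columns indexed by the states $0, 1, 2, 3, 4$, we get:

$$P = \begin{pmatrix}
0 & 1 & 0 & 0 & 0 \\
a & a & a & 0 & 0 \\
0 & a & a & a & 0 \\
0 & 0 & a & a & a \\
0 & 0 & 0 & 1 & 0
\end{pmatrix} \tag{2}$$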


where $a = 1/3$. $P$ is modelled this way so that the states with flows equal to zero or $C$ are
not considered feasible. If the flow of a link is equal to zero, then there is no communication
necessary. Hence, the probability of transition from state zero to state one is set to
100 percent. If the flow of a link is equal to $C$, then there is no capacity left for the remaining
links, which can happen only when there is only one link in the network. However, the
network considered here consists of many links. Hence, the probability of transition from
state $C$ to state $C-1$ is also set to 100 percent. Additionally, for $j = 1, \ldots, C-1$, given
link $i$, the probability $P_{j,j-1}$ that state $j$ transitions to state $j-1$ is equal to
$D_{i,j} = a$; the probability $P_{j,j}$ that state $j$ stays the same is equal to $S_{i,j} = a$; and
the probability $P_{j,j+1}$ that state $j$ transitions to state $j+1$ is equal to $I_{i,j} = a$. For
example, $P_{1,0} = P_{1,1} = P_{1,2} = a = D_{i,1} = S_{i,1} = I_{i,1}$
while all other transitions from state 1 are forbidden.
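
As an illustration of the construction just described, the sketch below builds $P$ for an arbitrary $C$ using exact fractions. The function name `transition_matrix` is an assumption made for this example; the entries follow the rules stated in the text (forced moves out of states zero and $C$, and $a = 1/3$ for each action in an interior state).

```python
from fractions import Fraction


def transition_matrix(C, a=Fraction(1, 3)):
    """Return the (C+1) x (C+1) no-learning matrix P as a list of rows."""
    if C < 2:
        raise ValueError("C must be at least 2")
    P = [[Fraction(0)] * (C + 1) for _ in range(C + 1)]
    P[0][1] = Fraction(1)      # state 0 always moves to state 1
    P[C][C - 1] = Fraction(1)  # state C always moves to state C-1
    for j in range(1, C):
        P[j][j - 1] = a  # decrease
        P[j][j] = a      # stay
        P[j][j + 1] = a  # increase
    return P


P = transition_matrix(4)
for row in P:
    print([str(x) for x in row])
```

With $a = 1/3$ every row sums to one, so $P$ is a valid stochastic matrix.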


4.1  Lemma  1 


The probability that the flow through a path is optimal is given by:

$$p_f \ge 1 - (1-a) \cdot \frac{(C-2)}{(C-1)^2} \cdot |Q_f^n| \tag{3}$$

where $a = 1/3$, $n$ is the number of iterations, $P_f$ is the transition matrix associated with
the flow of a path; $Q_f$ is equal to $P_f$ with the exceptions that the first row, first column,
last row and last column elements are set to zero; and $|Q_f^n|$ is the sum of all elements of
$Q_f^n$. That is,

$$|Q_f^n| = \sum_i \sum_j (Q_f^n)_{ij}$$


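For example, given $C = 4$, we get:

$$Q_f = \begin{pmatrix}
0 & 0 & 0 & 0 & 0 \\
0 & a & a & 0 & 0 \\
0 & a & a & a & 0 \\
0 & 0 & a & a & 0 \\
0 & 0 & 0 & 0 & 0
\end{pmatrix}, \qquad |Q_f| = 7 \cdot a \tag{4}$$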


Note that $P_f$ and $Q_f$ do not change with $n$. This is so because there is no learning.
However, when learning is in effect, the elements of $P_f$ and $Q_f$ are updated through the
triplets $(I, S, D)$.
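
The bound of Lemma 1 can be evaluated numerically. The sketch below assumes the bound has the form $p_f \ge 1 - (1-a)(C-2)/(C-1)^2 \cdot |Q_f^n|$, which is the final expression of the proof that follows; the helper names (`q_matrix`, `matmul`, `total`, `flow_bound`) are hypothetical.

```python
from fractions import Fraction


def q_matrix(C, a=Fraction(1, 3)):
    """Interior-only matrix Q: rows and columns 0 and C are all zero."""
    Q = [[Fraction(0)] * (C + 1) for _ in range(C + 1)]
    for i in range(1, C):
        for j in range(max(1, i - 1), min(C - 1, i + 1) + 1):
            Q[i][j] = a
    return Q


def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


def total(M):
    """|M|: the sum of all elements of M."""
    return sum(sum(row) for row in M)


def flow_bound(C, n, a=Fraction(1, 3)):
    """Assumed Lemma 1 bound: 1 - (1 - a)(C - 2)/(C - 1)^2 * |Q^n|."""
    if n < 1:
        raise ValueError("n must be at least 1")
    Q = q_matrix(C, a)
    power = Q
    for _ in range(n - 1):
        power = matmul(power, Q)
    return 1 - (1 - a) * Fraction(C - 2, (C - 1) ** 2) * total(power)


print(total(q_matrix(4)))                  # 7/3, i.e. 7a with a = 1/3
print(flow_bound(4, 1), flow_bound(4, 2))  # 53/81 175/243
```

Exact fractions are used so that the value $|Q_f| = 7a$ for $C = 4$ can be checked without rounding error.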
4.2 Proof of Lemma 1


To alleviate the notation, suppress the subscript $f$ associated with $P$ and $Q$. Let the
optimal flow be $f^*$. We wish to determine the probability that a random and non-optimal
flow $i$ transitions to another, also random and non-optimal, flow $j$. The probability of
picking a random flow such that $i \ne f^*$ is:

$$P(\text{flow} = i \ne f^*) = \frac{(C-2)}{(C-1)^2}$$

For example, given $C = 4$, there are five possible states belonging to $\{0, 1, 2, 3, 4\}$. A
random flow can be equal to any of these five states. However, if state zero and state four are
discarded as non-feasible states, there remain only three states. Hence the probability of
picking a random and feasible flow $i$ is $\frac{1}{(C-1)} = 1/3$. Further, let $f^* = 3$; then the
probability that a random and feasible flow is not optimal is $\frac{(C-2)}{(C-1)} = 2/3$. As a
result, the probability of picking at random a flow $i$ such that flow $i$ is feasible and not
optimal is

$$\frac{(C-2)}{(C-1)^2} = (1/3) \cdot (2/3) = 2/9.$$


The probability of starting with a feasible flow $i \ne f^*$ and ending up with another feasible
flow $j \ne f^*$ is regulated by the transition matrix $P$. That is,

$$\frac{(C-2)}{(C-1)^2} \cdot P_{i,j} \tag{5}$$

The probability of starting with any feasible flow $i \ne f^*$ and ending up with any other
feasible flow $j \ne f^*$ is the sum of the above expression over $i$ and $j$ such that
$i, j \ne f^*$, i.e.

$$(1-a) \cdot \frac{(C-2)}{(C-1)^2} \cdot \sum_{i,j \ne f^*} P_{i,j}$$

Furthermore, observe that

$$\sum_{i,j \ne f^*} P_{i,j} \le \sum_{i,j} P_{i,j}$$

since all the elements of the transition matrix $P$ are non-negative and the LHS sums over all
elements $i$ and $j$ such that $i, j \ne f^*$ while the RHS sums over all elements $i$ and $j$
with no restrictions. That is, the sum on the RHS includes more elements of $P$ than the sum
on the LHS. Therefore,

$$(1-a) \cdot \frac{(C-2)}{(C-1)^2} \cdot \sum_{i,j \ne f^*} P_{i,j} \le (1-a) \cdot \frac{(C-2)}{(C-1)^2} \cdot \sum_{i,j} P_{i,j}$$

Additionally, the RHS above sums over the states that are feasible and non-optimal, which
implies that $i, j \ne 0, C$. This can be interpreted in a way such that the RHS above includes
all transitions from a state $i$ that is feasible to a state $j$ that is also feasible. Therefore, any
transition from (to) $0$ or $C$ to (from) a feasible state is forbidden. This is equivalent to
replacing $P_{i,j}$ by $Q_{i,j}$. If this argument is repeated $n$ times, we get an upper bound for
the probability $q_f$ of not achieving the optimal state after $n$ iterations:

$$q_f \le (1-a) \cdot \frac{(C-2)}{(C-1)^2} \cdot \sum_{i,j} (Q^n)_{i,j} = (1-a) \cdot \frac{(C-2)}{(C-1)^2} \cdot |Q^n|$$

Therefore, the lower bound to the probability $p_f = 1 - q_f$ of achieving the optimal state
satisfies:

$$p_f \ge 1 - (1-a) \cdot \frac{(C-2)}{(C-1)^2} \cdot |Q^n|$$


4.3  Lemma  2 


The  probabilistic  bound  in  Eqn.  (3)  of  finding  the  optimal  flow  is  an  increasing  function  of  n  . 


4.4  Proof  of  Lemma  2 

This is true as

$$Q^{n+1} = Q^n \cdot Q = Q^n \cdot (P - A)$$

where $A = P - Q$. Simple algebra dictates that, because each row of $P$ sums to one,

$$|Q^n \cdot P| = |Q^n|$$

Hence,

$$|Q^{n+1}| = |Q^n \cdot P| - |Q^n \cdot A| = |Q^n| - |Q^n \cdot A|$$

Since $Q^n \cdot A$ has only positive and zero elements, this means that $|Q^n \cdot A| > 0$.
Therefore,

$$|Q^{n+1}| < |Q^n|$$

As

$$p_f \ge 1 - (1-a) \cdot \frac{(C-2)}{(C-1)^2} \cdot |Q^n|$$

this implies that the bound on $p_f$ is an increasing function of $n$. For example, given $C = 4$,

$$|Q \cdot A| = 4 \cdot a^2$$

Hence,

$$|Q^2| = |Q| - 4 \cdot a^2 < |Q|$$
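
The identities used in this proof can be checked exactly for $C = 4$. The following sketch uses exact fractions and hypothetical helper names (`build_p_and_q`, `matmul`, `total`); it assumes the no-learning matrices $P$ and $Q$ described in Section 4, with every interior probability equal to $a = 1/3$.

```python
from fractions import Fraction

a = Fraction(1, 3)


def build_p_and_q(C):
    """Return (P, Q) for states 0..C, every interior triplet equal to a."""
    size = C + 1
    P = [[Fraction(0)] * size for _ in range(size)]
    P[0][1] = Fraction(1)
    P[C][C - 1] = Fraction(1)
    for j in range(1, C):
        P[j][j - 1] = P[j][j] = P[j][j + 1] = a
    # Q zeroes the first and last row and column of P.
    Q = [[P[i][j] if 0 < i < C and 0 < j < C else Fraction(0)
          for j in range(size)] for i in range(size)]
    return P, Q


def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


def total(M):
    """|M|: the sum of all elements of M."""
    return sum(map(sum, M))


P, Q = build_p_and_q(4)
A = [[p - q for p, q in zip(pr, qr)] for pr, qr in zip(P, Q)]
Q2 = matmul(Q, Q)
print(total(Q), total(matmul(Q, A)), total(Q2))  # 7/3 4/9 17/9
```

The printed values reproduce $|Q| = 7a$, $|Q \cdot A| = 4a^2$ and $|Q^2| = |Q| - 4a^2$ for $a = 1/3$.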


4.5  Lemma  3 


The probability that the capacity of a link is optimal is given by

$$p_c \ge \min_{c = 1, \ldots, C} \left[ 1 - (1-a) \cdot \frac{(c-2)}{(c-1)^2} \cdot |Q_c^n| \right] \tag{6}$$

Note that $p_c$ is the probability of achieving the optimal capacity and $Q_c = Q_f$. However, if
we assume that learning occurs, then $Q_f$ and $Q_c$ will change as a function of $n$, in which
case they will not necessarily evolve in the same way. Observe that the capacity of a link
must be greater than the flow through that link since otherwise the average time latency
shown in Eqn. (1) is ill defined. If a flow is $j$, then the capacity ranges from $j + 1$ to $C$. If
the capacity is shifted to the left by $j$, we get $c$ ranging from one to $C - j$. Since $j > 0$, the
largest value for $c$ is $C$. The fact that the largest value of $c$ is $C$ and that

$$\frac{(c-1)}{c^2} \cdot |Q_{c+1}^n| \ge \frac{(c-2)}{(c-1)^2} \cdot |Q_c^n|$$

for all $c$, as proved below, allows us to assert Lemma 3.
4.6 Proof of Lemma 3


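Proceed by induction on $n$. The first step is to prove that the lemma holds for $n = 1$. That is,

$$\frac{(c-1)}{c^2} \cdot |Q_{c+1}| \ge \frac{(c-2)}{(c-1)^2} \cdot |Q_c|$$

Since

$$\frac{(c-1)}{c} \ge \frac{(c-2)}{(c-1)}$$

we only need to show that $\frac{1}{c} \cdot |Q_{c+1}| \ge \frac{1}{(c-1)} \cdot |Q_c|$. This can be
established by observing that $|Q_c| = (c - 3 + 4/3)$.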
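Now, assume that this is true for all $k = 1, \ldots, n$ and prove it for $n + 1$. We will show that
this is true when $c$ is odd. A similar proof can be shown when $c$ is even.

Let $Q_c^n = \left[ \vec{d}_1^{\,n}, \vec{d}_2^{\,n}, \ldots, \vec{d}_{c-1}^{\,n} \right]$ where
$\vec{d}_i^{\,n}$ is the $i$th column of $Q_c^n$, and
$Q_{c+1}^n = \left[ \vec{e}_1^{\,n}, \vec{e}_2^{\,n}, \ldots, \vec{e}_c^{\,n} \right]$ where
$\vec{e}_i^{\,n}$ is the $i$th column of $Q_{c+1}^n$. Note that the boundary rows and columns of
$Q_c^n$ and $Q_{c+1}^n$, whose elements are zeroes, were removed. This will not affect the
proof. $Q_c^n$ obeys a recursion:

$$Q_c^n \cdot Q_c = a \left[ \vec{d}_1^{\,n} + \vec{d}_2^{\,n},\; \vec{d}_1^{\,n} + \vec{d}_2^{\,n} + \vec{d}_3^{\,n},\; \vec{d}_2^{\,n} + \vec{d}_3^{\,n} + \vec{d}_4^{\,n},\; \ldots,\; \vec{d}_{c-2}^{\,n} + \vec{d}_{c-1}^{\,n} \right] \tag{7}$$

$Q_{c+1}^n$ obeys a similar recursion to the one above.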


4.6.1  Case  1:  Assume  that  c  is  odd 


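Applying recursion and induction repeatedly, we get $p$ inequalities where $p = \frac{c-1}{2}$.
Each time, the new inequality is obtained by removing the first term and the last term
on both the LHS and RHS of the previous inequality.

$$\frac{1}{c} \left( |\vec{e}_2^{\,n-1}| + \ldots + |\vec{e}_{c-1}^{\,n-1}| \right) \ge \frac{1}{c-1} \left( |\vec{d}_2^{\,n-1}| + \ldots + |\vec{d}_{c-2}^{\,n-1}| \right)$$

$$\frac{1}{c} \left( |\vec{e}_3^{\,n-2}| + \ldots + |\vec{e}_{c-2}^{\,n-2}| \right) \ge \frac{1}{c-1} \left( |\vec{d}_3^{\,n-2}| + \ldots + |\vec{d}_{c-3}^{\,n-2}| \right)$$

$$\vdots$$

$$\frac{1}{c} \left( |\vec{e}_p^{\,n-p}| + |\vec{e}_{p+1}^{\,n-p}| + |\vec{e}_{p+2}^{\,n-p}| \right) \ge \frac{1}{c-1} \left( |\vec{d}_p^{\,n-p}| + |\vec{d}_{p+1}^{\,n-p}| \right) \tag{8}$$

$$\frac{1}{c} \left( |\vec{e}_p^{\,n-p-1}| + 2 \cdot |\vec{e}_{p+1}^{\,n-p-1}| + |\vec{e}_{p+2}^{\,n-p-1}| \right) \ge \frac{1}{c-1} \left( |\vec{d}_p^{\,n-p-1}| + |\vec{d}_{p+1}^{\,n-p-1}| \right) \tag{9}$$

where $|\vec{x}|$ is the sum of all elements of the vector $\vec{x}$. The lemma is true if the first
inequality is true. But the first inequality is true if the second inequality is true.
Repeating the argument tells us that the lemma is true if the last inequality is true.
Again, by induction, Eqn. (8) is true when replacing $n-p$ by $n-p-1$:

$$\frac{1}{c} \left( |\vec{e}_p^{\,n-p-1}| + |\vec{e}_{p+1}^{\,n-p-1}| + |\vec{e}_{p+2}^{\,n-p-1}| \right) \ge \frac{1}{c-1} \left( |\vec{d}_p^{\,n-p-1}| + |\vec{d}_{p+1}^{\,n-p-1}| \right) \tag{10}$$

Combining Eqns. (9) and (10), we get:

$$\frac{1}{c} \left( |\vec{e}_p^{\,n-p-1}| + 2 \cdot |\vec{e}_{p+1}^{\,n-p-1}| + |\vec{e}_{p+2}^{\,n-p-1}| \right) \ge \frac{1}{c} \left( |\vec{e}_p^{\,n-p-1}| + |\vec{e}_{p+1}^{\,n-p-1}| + |\vec{e}_{p+2}^{\,n-p-1}| \right) \ge \frac{1}{c-1} \left( |\vec{d}_p^{\,n-p-1}| + |\vec{d}_{p+1}^{\,n-p-1}| \right)$$

Hence the proof is complete because we have shown that the inequality (9) is true.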


4.6.2  Case  2:  Assume  that  c  is  even 


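Apply recursion and induction to

$$\frac{|Q_{c+1}^n|}{c} \ge \frac{|Q_c^n|}{c-1}$$

to get $p$ inequalities where $p = \frac{c}{2}$.

Use the same technique as in case 1.

$$\frac{1}{c} \left( |\vec{e}_2^{\,n-1}| + \ldots + |\vec{e}_{c-1}^{\,n-1}| \right) \ge \frac{1}{c-1} \left( |\vec{d}_2^{\,n-1}| + \ldots + |\vec{d}_{c-2}^{\,n-1}| \right)$$

$$\frac{1}{c} \left( |\vec{e}_3^{\,n-2}| + \ldots + |\vec{e}_{c-2}^{\,n-2}| \right) \ge \frac{1}{c-1} \left( |\vec{d}_3^{\,n-2}| + \ldots + |\vec{d}_{c-3}^{\,n-2}| \right)$$

$$\vdots$$

$$\frac{1}{c} \left( |\vec{e}_p^{\,n-p}| + |\vec{e}_{p+1}^{\,n-p}| \right) \ge \frac{1}{c-1} \, |\vec{d}_p^{\,n-p}|$$

The inequalities above are true if and only if:

$$\frac{1}{c} \left( |\vec{e}_p^{\,n-p-1}| + |\vec{e}_{p+1}^{\,n-p-1}| \right) \ge 0$$

But the above is true as all elements of the matrices $Q^n$ are non-negative. Hence the
proof is complete.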

4.7  Corollary  of  Lemma  3 


Combining the result of Lemma 1 with that of Lemma 3, we obtain the second main result of
this paper, the lower bound to the probability of finding the optimal solution in a network that
has $e$ links and $s$ paths between all pairs of nodes:

$$P \ge \left[ 1 - (1-a) \cdot \frac{(C-2)}{(C-1)^2} \cdot |Q_f^n| \right]^{s} \times \left[ 1 - (1-a) \cdot \frac{(C-2)}{(C-1)^2} \cdot |Q_c^n| \right]^{e} \tag{12}$$


4.8  Proof  of  Corollary 


The probability of finding the optimal solution is the product of the probability that each path
carries the optimal flow (provided by Eqn. (3)) and the probability that each link has the
optimal capacity (provided by Eqn. (6)). That is, following the inequality sign in Eqn. (12),
the first factor is the lower probabilistic bound of finding the optimal flows where $s$ is the
total number of paths between all pairs of nodes, while the second factor is the lower
probabilistic bound of finding the optimal capacities for $e$ links.
4.9 Example


Figure 4 below shows that the lower bound on the probability of achieving the optimal
solution increases with the number of runs. $C = 20$ ($C = 21$ and $C = 22$) means the flow
through each link ranges from zero to twenty (zero to twenty-one and zero to twenty-two).
Note that as $C$ decreases, the search space decreases; hence it is easier to find the optimal
solution and the probability of achieving the optimal solution increases. Figure 4 assumes the
network shown in Figure 2, i.e., there are ten links ($e = 10$) and ten pairs of nodes.
Assuming five paths per pair of nodes yields the parameter $s = 50 = 10 \cdot 5$. Based on
Eqn. (12), the lower bound for the probability of finding the optimal solution is:

$$P \ge \left[ 1 - (1-a) \cdot \frac{(C-2)}{(C-1)^2} \cdot |Q_c^n| \right]^{60}$$



Figure 4: Lower Bound to the Probability of Achieving the Optimal Solution as a Function of Number of Runs.
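
The trend described for Figure 4 can be reproduced numerically. The sketch below is an illustration only: it assumes the per-agent bound $1 - (1-a)(C-2)/(C-1)^2 \cdot |Q_c^n|$ raised to the power $e + s = 60$, uses floating-point arithmetic, and the function names `q_power_sum` and `network_bound` as well as the sampled values of $n$ are hypothetical choices, not taken from the report.

```python
def q_power_sum(C, n, a=1 / 3):
    """Return |Q^n| for the no-learning chain with interior states 1..C-1."""
    v = [1.0] * (C - 1)  # Q^0 applied to the all-ones vector
    for _ in range(n):
        # One application of the tridiagonal Q (value a on three diagonals).
        v = [
            a * ((v[i - 1] if i > 0 else 0.0) + v[i]
                 + (v[i + 1] if i < len(v) - 1 else 0.0))
            for i in range(len(v))
        ]
    return sum(v)


def network_bound(C, n, e=10, s=50, a=1 / 3):
    """Assumed bound of the example: one per-agent factor to the power e + s."""
    factor = 1 - (1 - a) * (C - 2) / (C - 1) ** 2 * q_power_sum(C, n, a)
    return factor ** (e + s)


for C in (20, 21, 22):
    print(C, [round(network_bound(C, n), 4) for n in (100, 500, 1000)])
```

Summing $Q^n$ applied to the all-ones vector gives $|Q^n|$ without forming the matrix power, which keeps the cost linear in both $n$ and $C$.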
4.10 Discussion

The reader might wonder what happens when $a$ is equal to zero. If that were the case, then
$Q_c = 0$, hence $|Q_c^n| = 0$ and

$$P \ge \left[ 1 - (1-a) \cdot \frac{(C-2)}{(C-1)^2} \cdot |Q_c^n| \right]^{60} = 1$$

for all $n$, which implies that we would find the optimal state at the first iteration, i.e., when
$n$ is equal to one. This seems paradoxical. The answer lies in the fact that $n$ is the number
of iterations that will produce feasible solutions. As a result, when $Q_c = 0$, there is no
feasible solution because the probability of transition from a feasible state to another feasible
state as dictated by the matrix $Q_c$ is zero.


Even though, in this report, a specific type of Markov matrix $Q_c$ (as shown in Eqn. (4)) was
chosen, it is believed that the methodology developed here can be extended to a more general
class of Markov processes. For example, transitions do not necessarily have to be among
nearest neighbours. State 1 can transition to state 3 without going through state 2. This
means $Q_c$ can be more general as shown below:

$$Q_c = \begin{pmatrix}
0 & 0 & 0 & 0 & 0 \\
0 & a_{1,1} & a_{1,2} & a_{1,3} & 0 \\
0 & a_{2,1} & a_{2,2} & a_{2,3} & 0 \\
0 & a_{3,1} & a_{3,2} & a_{3,3} & 0 \\
0 & 0 & 0 & 0 & 0
\end{pmatrix}$$


5  Conclusion 


This  paper  described  a  new  agent-based  algorithm  that  optimizes  an  objective  function 
depending  on  both  the  flow  and  the  capacity  of  each  link,  and  that  satisfies  the  traffic  matrix 
constraint.  In  addition,  a  novel  rate  of  convergence  was  derived  for  this  algorithm  when 
assuming  no  reinforcement  learning.  It  is  believed  that  the  rate  of  convergence,  when 
reinforcement  learning  is  imposed,  can  be  further  derived. 
Glossary


Technical term        Explanation of term

$A$                   Difference between the matrix $P$ and the matrix $Q$
$\gamma_{uv}$         Minimal traffic requirement between node $u$ and node $v$
$C$                   Maximal capacity of the network
$c_i$                 Capacity of link $i$
$\vec{d}_i^{\,n}$     The $i$th column of the matrix $Q_c^n$
$D^c_{ij}$            Probability that the current capacity $j$ of link $i$ should be decreased
$D^p_{uvlk}$          Probability that the path $l$ that carries $k$ flows between node $u$ and node $v$ should be decreased
$\vec{e}_i^{\,n}$     The $i$th column of the matrix $Q_{c+1}^n$
$f$                   Flow through a link
$I^c_{ij}$            Probability that the current capacity $j$ of link $i$ should be increased
$I^p_{uvlk}$          Probability that the path $l$ that carries $k$ flows between node $u$ and node $v$ should be increased
$P_{ij}$              Transition matrix from $i$ units of flow to $j$ units of flow
$Q_{ij}$              Same as $P$ with the exceptions that the first row, first column, last row and last column are set to zeroes
$S^c_{ij}$            Probability that the current capacity $j$ of link $i$ should be maintained
$S^p_{uvlk}$          Probability that the path $l$ that carries $k$ flows between node $u$ and node $v$ should be maintained